Machine learning in the prediction of post-stroke cognitive impairment: a systematic review and meta-analysis

Objective: Cognitive impairment is a detrimental complication of stroke that compromises patients' quality of life and poses a huge burden on society. Because effective early prediction tools are lacking in clinical practice, many researchers have introduced machine learning (ML) into the prediction of post-stroke cognitive impairment (PSCI). However, the mathematical models used in ML are diverse, and their accuracy remains highly contentious. Therefore, this study aimed to examine the efficiency of ML in the prediction of PSCI.
Methods: Relevant articles were retrieved from Cochrane, Embase, PubMed, and Web of Science from the inception of each database to 5 December 2022. Study quality was evaluated with PROBAST, and the c-index, sensitivity, specificity, and overall accuracy of the prediction models were meta-analyzed.
Results: A total of 21 articles involving 7,822 stroke patients (2,876 with PSCI) were included. The main modeling variables comprised age, gender, education level, stroke history, stroke severity, lesion volume, lesion site, stroke subtype, white matter hyperintensity (WMH), and vascular risk factors. The prediction models were predominantly nomograms constructed on the basis of logistic regression. The pooled c-index, sensitivity, and specificity were 0.82 (95% CI 0.77–0.87), 0.77 (95% CI 0.72–0.80), and 0.80 (95% CI 0.71–0.86) in the training set, and 0.82 (95% CI 0.77–0.87), 0.82 (95% CI 0.70–0.90), and 0.80 (95% CI 0.68–0.82) in the validation set, respectively.
Conclusion: ML is a potential tool for predicting PSCI and may be used to develop simple clinical scoring scales for subsequent clinical use.
Systematic Review Registration: https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=383476.


Introduction
Stroke is a leading cause of death and long-term disability and places a huge burden on societies worldwide (1). Post-stroke cognitive impairment (PSCI) is a common outcome and cause of death following stroke. Stroke patients have a higher 1-year incidence of cognitive impairment than non-stroke populations (2, 3). As societies and economies develop, more emphasis is placed on disease and health, especially cognitive impairment. Early identification and diagnosis of PSCI, as well as early prophylaxis and treatment, can help improve stroke patients' prognosis and reduce social and economic burdens.
Clinical tools for early PSCI diagnosis in stroke patients are currently lacking. Researchers have tried to apply existing cognitive impairment risk prediction models constructed based on the general population to the prediction of PSCI, but their predictive performance was not ideal in stroke patients (4). As a result, researchers have shifted their focus to machine learning (ML) in the hopes of developing more accurate PSCI prediction models. ML is an emerging field in medicine that utilizes computer science and statistics to solve healthcare problems (5). In recent years, ML has been increasingly applied to stroke research, and it was shown that ML-based stroke image prediction can outperform existing prediction tools (6). However, the diversity in mathematical modeling and sensitivity of ML algorithms to factors such as patient sampling, missing data and sample size continue to fuel debates over the accuracy of these models in disease prediction.
The performance of existing stroke prediction models has been inconsistent, partly because they differ in the type of ML used: most researchers use logistic regression, while some consider it insufficient and opt for alternative models. The models also differ in their selection of modeling variables, which adds to the uncertainty of their results. Evidence-based studies investigating the efficiency of ML in the prediction of PSCI are still lacking. This study therefore aimed to examine the predictive accuracy of ML in PSCI and to comprehensively summarize the modeling variables included in these prediction models, in order to provide a useful reference for the subsequent development of simple clinical prediction tools.

Materials and methods
This study was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines (Supplementary material) (7).
This study has been registered in PROSPERO (CRD42022383476).

Eligibility criteria
Inclusion criteria
(1) Patients diagnosed with ischemic stroke or hemorrhagic stroke.
Exclusion criteria
(2) Only risk factor analysis was performed, without a complete ML model.
(3) Missing outcome measures (ROC, c-statistic, c-index, sensitivity, specificity, accuracy, recovery rate, accuracy rate, confusion matrix, diagnosis table, F1 score, and calibration curve).
(4) Validation of a mature scale only.
(5) Studies on the accuracy of single-factor prediction models.

Search strategy
Relevant articles were systematically searched in Cochrane, Embase, PubMed, and Web of Science from the inception of each database to 5 December 2022 using MeSH and entry terms without restriction on language or region. The detailed retrieval process is outlined in Supplementary material.

Literature screening and data extraction
Retrieved articles were imported into EndNote for management, and duplicates were removed. The titles and abstracts were screened to exclude irrelevant studies, and the full texts of the remaining records were downloaded and checked for eligibility. Data were collected from the included studies using a customized data extraction form. The collected data comprised title, first author, year of publication, author country, type of study, source of patients, type of stroke, diagnostic criteria for cognitive impairment, length of follow-up, number of PSCI cases, total number of subjects, training set, validation set, type of model used, imputation method for missing values, variable screening, and modeling variables. Two independent researchers (YY and HY) performed the literature screening and data extraction, and subsequently cross-checked their results. Any disagreement was resolved by a third researcher (XSL).

Risk of bias assessment
The Prediction model Risk Of Bias Assessment Tool (PROBAST) was employed to evaluate the quality of the included studies. PROBAST consists of four domains, namely participants, predictors, outcome, and analysis (8). The four domains contain 2, 3, 6, and 9 specific questions, respectively. Each question has three options: yes/probably yes (Y/PY), no/probably no (N/PN), and no information (NI). If a domain has at least one N/PN, it is rated as high risk. To be graded as low risk, a given domain must have Y/PY for all questions. When all domains are at low risk, the overall risk of bias is low; alternatively, when at least one domain is assessed as high risk, the overall risk of bias is high (9). Two researchers (XSL and DDY) independently evaluated the risk of bias in the included studies and subsequently cross-checked their results. Any disagreement was resolved by a third researcher (BYW).
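The rating rules described above can be sketched in a few lines of Python. This is only an illustration of the logic stated in the text, not the official PROBAST implementation. The "unclear" rating for a domain with no N/PN but at least one NI is an assumption (the text does not state that case), and the example study is hypothetical.

```python
from typing import Dict, List

HIGH, LOW, UNCLEAR = "high", "low", "unclear"


def rate_domain(answers: List[str]) -> str:
    """Rate one PROBAST domain from its answers ("Y/PY", "N/PN", or "NI")."""
    if any(a == "N/PN" for a in answers):
        return HIGH  # at least one N/PN makes the domain high risk
    if all(a == "Y/PY" for a in answers):
        return LOW  # every question answered Y/PY
    # Assumption: the remaining case (some NI, no N/PN) is labelled "unclear".
    return UNCLEAR


def rate_overall(domains: Dict[str, List[str]]) -> str:
    """Combine the domain ratings into an overall risk-of-bias judgement."""
    ratings = [rate_domain(answers) for answers in domains.values()]
    if HIGH in ratings:
        return HIGH  # one high-risk domain is enough
    if all(r == LOW for r in ratings):
        return LOW
    return UNCLEAR  # assumption, as above


# Hypothetical study; question counts follow the 2/3/6/9 layout in the text.
study = {
    "participants": ["Y/PY", "Y/PY"],
    "predictors": ["Y/PY"] * 3,
    "outcome": ["Y/PY"] * 6,
    "analysis": ["Y/PY"] * 8 + ["N/PN"],
}
print(rate_overall(study))  # high
```

A single N/PN answer in the analysis domain is enough to rate the whole study as high risk, which is consistent with small samples and missing validation sets driving the ratings in this review.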

Outcome measures
The primary outcome was the c-index, which reflects the overall accuracy of ML models. However, this indicator alone may not fully reflect the predictive accuracy of ML models in PSCI because the proportions of PSCI and non-PSCI patients in the included literature are severely unbalanced. Therefore, sensitivity and specificity were included as complementary outcome measures to evaluate the predictive accuracy of ML in PSCI.
Frontiers in Neurology, frontiersin.org
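To make the relationship between these outcome measures concrete, the following Python sketch computes the c-index (for a binary outcome, the proportion of case/non-case pairs in which the case receives the higher predicted risk) together with sensitivity and specificity at a fixed cut-off. The data, the 0.5 threshold, and the function names are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def c_index(y_true: List[int], y_score: List[float]) -> float:
    """Concordance index for a binary outcome (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def sensitivity_specificity(
    y_true: List[int], y_score: List[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= threshold count as positive."""
    tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical, unbalanced sample: 2 PSCI cases among 8 patients.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1]

print(round(c_index(y_true, y_score), 3))  # 0.917
sens, spec = sensitivity_specificity(y_true, y_score)
print(sens, round(spec, 3))  # 0.5 0.833
```

In this toy sample the c-index is high even though half of the cases are missed at the chosen cut-off, which illustrates why sensitivity and specificity are reported alongside the c-index when the classes are unbalanced.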

Data synthesis and statistical analysis
The c-index and accuracy of ML models were meta-analyzed. If the 95% confidence interval (CI) and standard error were missing for the c-index, they were estimated using the methods of Debray (10). Given the differences in modeling variables and parameters, the c-index was pooled using a random effects model, while sensitivity and specificity were pooled using a bivariate mixed effects model. In systematic reviews based on machine learning, heterogeneity is difficult to avoid. According to the Cochrane tool, I² values of around 25%, 50%, and 75% are deemed to represent low, medium, and high levels of heterogeneity, respectively (11). The sensitivity and robustness of the results were evaluated using the leave-one-out method. Publication bias was qualitatively assessed using a funnel plot and quantitatively assessed by Egger's regression test (p value). All meta-analyses were conducted in R 4.2.0 (R Development Core Team, Vienna, http://www.R-project.org). A p value < 0.05 was considered statistically significant.
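As a rough illustration of the pooling steps named above, the Python sketch below pools c-indices with a random effects model, reports I², and repeats the pooling with each study left out. It is not the R workflow used in this study. The DerSimonian-Laird estimator, pooling on the raw (untransformed) scale, and the recovery of a standard error from a reported 95% CI by a normal approximation are all assumptions, and the three input studies are hypothetical.

```python
import math
from typing import Dict, List

Z_95 = 1.959964  # normal quantile for a two-sided 95% interval


def se_from_ci(lower: float, upper: float) -> float:
    """Approximate a standard error from a symmetric 95% CI (assumption)."""
    return (upper - lower) / (2.0 * Z_95)


def random_effects_pool(estimates: List[float], ses: List[float]) -> Dict[str, float]:
    """DerSimonian-Laird random effects pooling with the I² statistic."""
    k = len(estimates)
    w = [1.0 / se ** 2 for se in ses]  # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "pooled": pooled,
        "ci_low": pooled - Z_95 * se_pooled,
        "ci_high": pooled + Z_95 * se_pooled,
        "tau2": tau2,
        "i2": i2,
    }


def leave_one_out(estimates: List[float], ses: List[float]) -> List[float]:
    """Pooled estimate obtained after omitting each study in turn."""
    return [
        random_effects_pool(estimates[:i] + estimates[i + 1:], ses[:i] + ses[i + 1:])["pooled"]
        for i in range(len(estimates))
    ]


# Hypothetical c-indices with reported 95% CIs (not data from this review).
c_values = [0.78, 0.85, 0.81]
c_ses = [se_from_ci(0.72, 0.84), se_from_ci(0.80, 0.90), se_from_ci(0.74, 0.88)]
result = random_effects_pool(c_values, c_ses)
print(round(result["pooled"], 3), round(result["i2"], 1))
print([round(v, 3) for v in leave_one_out(c_values, c_ses)])
```

If the leave-one-out estimates stay close to the overall pooled value, no single study dominates the result, which is the sense in which the sensitivity analysis is called robust.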

Results
Study selection
The literature screening process is illustrated in Figure 1. We identified a total of 5,053 unique records. After reviewing the full texts of 41 reports, 21 studies were ultimately included (12-32).

Risk of bias assessment
The high risk of bias in the included studies was attributed to limited sample size, retrospective cohort design, and the lack of a validation set. Therefore, these attributes should be improved in subsequent model construction. The results of the risk of bias assessment are summarized in Figure 2.
Meta-regression on follow-up period and on study design showed no significant differences in the c-index between the training and validation sets, even after accounting for variation in study design and follow-up time (Figures 7-10; Tables 3, 4).

Sensitivity analysis and publication bias
Sensitivity analysis indicated that the results of both the training and validation sets were robust (Supplementary Figures S1, S2). However, the asymmetry in the funnel plot and the results of Egger's regression test suggest that publication bias may be present in the training set (p = 0.056 for Egger's regression test) and is clearly present in the validation set (p = 0.005 for Egger's regression test; Supplementary Figures S3-S6). There were few independent validation cohorts in the included literature, and the presence of multiple independent validation cohorts in the same study may have contributed to publication bias.
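For readers unfamiliar with Egger's regression test, the following Python sketch shows its classical form: the standardized effect (estimate divided by its standard error) is regressed on precision (the inverse of the standard error), and an intercept far from zero suggests funnel plot asymmetry. This is a minimal sketch with hypothetical inputs, not the R routine used in this study. It returns the t statistic and degrees of freedom only; the p value would be obtained from a t distribution with k - 2 degrees of freedom.

```python
import math
from typing import Dict, List


def egger_test(estimates: List[float], ses: List[float]) -> Dict[str, float]:
    """Classical Egger regression: (estimate / SE) regressed on (1 / SE)."""
    k = len(estimates)
    if k < 3:
        raise ValueError("Egger's regression needs at least three studies")
    x = [1.0 / se for se in ses]  # precision
    z = [y / se for y, se in zip(estimates, ses)]  # standardized effect
    x_bar = sum(x) / k
    z_bar = sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    sse = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    resid_var = sse / (k - 2)
    se_intercept = math.sqrt(resid_var * (1.0 / k + x_bar ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return {
        "intercept": intercept,
        "slope": slope,
        "se_intercept": se_intercept,
        "t": t_stat,
        "df": float(k - 2),
    }


# Hypothetical c-indices and standard errors (not data from this review).
c_values = [0.90, 0.86, 0.84, 0.80, 0.79, 0.81]
c_ses = [0.060, 0.045, 0.040, 0.020, 0.015, 0.025]
res = egger_test(c_values, c_ses)
print(round(res["intercept"], 2), round(res["t"], 2), int(res["df"]))
```

In the hypothetical data the less precise studies report higher c-indices, the pattern that typically produces an asymmetric funnel plot.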

Discussion
Our meta-analysis of 21 original studies demonstrated that ML may be an ideal tool for predicting PSCI. The training set had a c-index of 0.82 (95% CI 0.77-0.87) and sensitivity and specificity of >70%, indicating considerable predictive accuracy in PSCI. Furthermore, the accuracy of the validation set was not significantly lower than that of the training set, indicating that the ML models have good applicability. Currently, logistic regression (LR) is the preferred model in clinical practice because it readily yields highly accessible nomograms, such as the nomogram on lymph node metastasis developed by the Sloan-Kettering Cancer Center (39-41). In our study, LR was also the preferred model among researchers, as it exhibited c-index performance comparable to that of other ML algorithms while achieving higher sensitivity and specificity. As a result, we conclude that LR demonstrates satisfactory predictive ability for PSCI in this study.
We found that LR is the primary type of model used for prediction in stroke patients. LR is a classification algorithm that aims to establish the relationship between features and the probability of specific outcomes (42). ML is commonly used to address issues encountered in clinical practice, with supervised learning and unsupervised learning being the most common approaches. Supervised learning primarily focuses on diagnosis and on predicting disease prognosis or progression, and involves training, validation, and testing. The training process involves entering predictive factors into the model and using the model's inherent parameter calculation rules (e.g., maximum likelihood estimation, iteration) to estimate the optimal model parameters. The selection of modeling variables (feature selection) is crucial for the training process and has been a subject of ongoing debate because of the diversity of available methods. Furthermore, validation and testing are crucial for a completed model as they reflect the model's robustness. Unfortunately, in actual research, most studies lacked effective external validation. The original studies included in our analysis predominantly used a supervised ML process with a single-factor + multi-factor LR model selection method and performed internal validation through random sampling (43-45).

FIGURE 1
Flowchart of study selection.
In our study, the c-index of LR did not significantly lag behind those of other types of ML models, and LR demonstrated relatively high sensitivity and specificity. Hence, we believe that LR exhibits promising predictive potential for PSCI.
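The training step described in this discussion (entering predictive factors and iteratively estimating the parameters by maximum likelihood) can be sketched as follows. This is a toy Python illustration rather than the procedure of any included study: it fits an LR model by plain gradient ascent on the log-likelihood, the learning rate and iteration count are arbitrary choices, and the single standardized predictor and its outcomes are hypothetical.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(beta: List[float], x: List[float]) -> float:
    """Predicted probability; beta[0] is the intercept."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


def fit_logistic(
    X: List[List[float]], y: List[int], lr: float = 0.1, n_iter: int = 5000
) -> List[float]:
    """Estimate LR coefficients by gradient ascent on the log-likelihood."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = yi - predict_proba(beta, xi)  # score contribution of one patient
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


# Hypothetical standardized predictor (e.g., a stroke severity z-score)
# and PSCI status (1 = PSCI); the classes overlap, so the estimate is finite.
X = [[-1.5], [-1.0], [-0.5], [0.0], [0.2], [0.5], [1.0], [1.5]]
y = [0, 0, 0, 1, 1, 0, 1, 1]

beta = fit_logistic(X, y)
print([round(b, 2) for b in beta])
print(round(predict_proba(beta, [1.5]), 2), round(predict_proba(beta, [-1.5]), 2))
```

The fitted intercept and coefficient are the quantities that a nomogram turns into points, which is one reason LR-based models are easy to convert into bedside scoring tools.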
In addition, we found that the major modeling variables for the ML-based PSCI prediction models were age, gender, education level, white matter hyperintensity (WMH), stroke history, stroke severity, lesion volume, lesion site, stroke subtype, and vascular risk factors. These modeling variables were still mainly based on past identified risk factors (race, age, gender, education level, vascular risk factor, stroke severity, and stroke lesion site and volume) (46), and very few or no newly identified risk factors were used for modeling, such as blood proteins [homocysteine (Hcy), C-reactive protein (CRP), low-density lipoprotein cholesterol (LDL-C), total cholesterol (TC)] that have been recognized as effective biomarkers for PSCI (47), cognitive reserve (CR) (48), activity and participation of stroke survivors (49), and intestinal dysbiosis (50). Therefore, the newly identified risk factors should be prioritized for further validation as their efficacy as modeling variables remains uncertain.
It was reported that common cognitive screening tools have similar predictive accuracy in PSCI. Although the MoCA has significantly better sensitivity in PSCI prediction than other cognitive screening tools, its specificity is less than desirable (51,52). This demonstrates that there is a lack of effective prediction models for the early screening of PSCI. However, our findings showed that ML has considerably high predictive accuracy (c-index, sensitivity, and specificity) in PSCI and is a promising tool for predicting PSCI.
A recent systematic review indicated that although PSCI has unique risk factors (e.g., vascular risk factors, lifestyle, overweight, and obesity), it is currently unclear whether intervening on these risk factors can effectively reduce the incidence of PSCI. Most approaches for lowering PSCI incidence still depend on effective prophylaxis for stroke (46). Therefore, effective prediction tools for the early identification and diagnosis of PSCI are urgently needed. Despite the uncertainty in the intervention measures for post-stroke cognitive functions, some researchers found that physical activity intervention and noninvasive brain stimulation can improve post-stroke cognitive functions compared with conventional care (53). However, the improvement in PSCI at ≥2 years after intervention was small (54). Moreover, patients with cognitive impairment have significantly increased risks of subsequent ischemic and fatal stroke (55,56). Hence, early identification and appropriate treatment and rehabilitation measures are critical for improving the health and life expectancy of PSCI patients. Our study demonstrated the feasibility of ML for developing PSCI prediction tools and showed that ML is an important means of PSCI prediction. Given the low number of PSCI prediction models for hemorrhagic stroke included in this study (n = 3) (19), the predictive accuracy of ML vs. common cognitive screening tools in PSCI in hemorrhagic stroke patients remains unclear and warrants further investigation.

FIGURE 2
Risk of bias assessment.
ML plays an important role in the clinical management of stroke and in improving the accuracy and efficiency of stroke prediction, diagnosis, personalized treatment, and prognosis assessment (57). For the prediction of stroke risk, ML algorithms can be trained on patient data to establish predictive models and estimate the risk of stroke based on individual patient information, clinical indicators, and biomarkers. For stroke diagnosis, ML can learn and identify radiological features of stroke and assist physicians with early and accurate diagnosis. In addition, ML can predict the efficacy and safety of different treatment options based on the patient's personal information, medical history, and clinical manifestations, enabling physicians to develop personalized treatment strategies. Furthermore, ML algorithms can predict post-stroke recovery and long-term prognosis based on patients' clinical and biomarker data. ML has been extensively used in stroke diagnosis, particularly in brain imaging, with the support vector machine (SVM) being the optimal model for stroke imaging (6,44,57). However, in our study, SVM exhibited lower sensitivity than LR despite a higher c-index, and the number of SVM models was limited (n = 1). Therefore, further exploration and development of SVM in predicting PSCI are warranted. The accuracy of PSCI prediction may be optimized by using different SVM models and parameter settings. SVM has several variants, such as non-linear SVM, multi-kernel SVM, and support vector regression, which are selected according to the specific circumstances. Additionally, the combination of SVM with other ML methods can be explored for PSCI prediction. For instance, integrating SVM with deep learning techniques may improve the accuracy and robustness of predictions when analyzing image or text data. Moreover, extensive clinical validation studies are required to assess the actual effectiveness of SVM in PSCI prediction.
The application value of SVM in PSCI prediction can be comprehensively assessed by collecting more data from PSCI patients and evaluating the models on independent validation sets. In conclusion, SVM, as a widely used ML method, has untapped potential in PSCI prediction. Continuous learning and research efforts can further refine and optimize the application of SVM in PSCI prediction, providing more accurate diagnostic and treatment decision support for clinical practitioners.
Forest plot of sensitivity and specificity in the training set. LR, logistic regression; SVM, support vector machine; DT, decision tree; MEM, mixed effects model.
Forest plot of c-index in the validation set. LR, logistic regression; LASSO, least absolute shrinkage and selection operator.
For this systematic review, the literature search was performed up until December 2022, and additional studies on this topic may become available subsequently. Hence, a regular review of the literature is recommended to obtain the most updated progress on this research topic.
Forest plot of sensitivity and specificity in the validation set. LR, logistic regression; LASSO, least absolute shrinkage and selection operator.
Meta-regression bubble plot of follow-up time in the training set (circles represent weights, with larger circles indicating greater weight and smaller confidence interval).

Strengths and limitations
This systematic review is the first to demonstrate the feasibility of ML in PSCI prediction. The included models were highly consistent and were predominantly logistic regression nomograms, which minimized heterogeneity.
Despite a comprehensive literature search, the number of included studies and models was still relatively low, and bias may be present in model construction.

Conclusion
ML has considerable predictive accuracy and is a promising prediction tool for PSCI. Therefore, future studies should concentrate on constructing ML models based on multi-racial, multi-center, and large-cohort samples and transforming them into simple clinical scoring tools with wide application. This will [...]
Meta-regression bubble plot of design in the training set (circles represent weights, with larger circles indicating greater weight and smaller confidence interval; 1, randomized controlled trial; 2, prospective cohort study; 3, retrospective cohort study).

FIGURE 10
Meta-regression bubble plot of design in the validation set (circles represent weights, with larger circle indicating greater weight and smaller confidence interval).

Data availability statement
The original contributions presented in the study are included in the article/Supplementary material; further inquiries can be directed to the corresponding authors.

SUPPLEMENTARY FIGURE S1
Sensitivity analysis of the training set.

SUPPLEMENTARY FIGURE S2
Sensitivity analysis of the validation set.

SUPPLEMENTARY FIGURE S3
The funnel plot of the training set.

SUPPLEMENTARY FIGURE S4
The funnel plot of the validation set.

SUPPLEMENTARY FIGURE S5
Egger's regression test of the training set.

SUPPLEMENTARY FIGURE S6
Egger's regression test of the validation set.